ABSTRACT

Assume that $N$ mutually independent observations have been taken from the population specified by

$$P\{X_i \in M_j\} = p_j, \qquad i = 1, 2, \ldots, N, \quad j = 1, 2, \ldots,$$

where $X_i$ denotes the $i$th observation and $M_j$ denotes the $j$th class. The classes are not assumed to have a natural ordering. Then the entropy is defined by

$$H = -\sum_j p_j \log p_j .$$

The natural estimator $\hat H = -\sum_j \hat p_j \log \hat p_j$ is shown to have certain deficiencies when the number of classes is large relative to the sample size or is infinite. A procedure based on quadrature methods is proposed as a means of circumventing these deficiencies.

AMS(MOS) Subject Classification: 62G05, 94A15
Key Words: Estimation of entropy
THE STATISTICAL ESTIMATION OF ENTROPY
IN THE NON-PARAMETRIC CASE

Bernard  Harris 


1. Introduction and Summary. Assume that a random sample of size $N$ has been drawn from a "multinomial population" with an unknown and possibly countably infinite number of classes. That is, if $X_i$ is the $i$th observation and $M_j$ is the $j$th class, then

$$P\{X_i \in M_j\} = p_j > 0, \qquad j = 1, 2, \ldots, \quad i = 1, 2, \ldots, N, \tag{1}$$

and $\sum_{j=1}^{\infty} p_j = 1$. The classes are not assumed to have a natural ordering.

In such statistical populations, the entropy, defined by

$$H = H(p_1, p_2, \ldots) = -\sum_{j=1}^{\infty} p_j \log p_j, \tag{2}$$

is a natural parameter of interest. For technical reasons, natural logarithms will be employed throughout, rather than the more customary base 2 logarithms. This modification is equivalent to a change of scale and will have no essential effect on the subsequent discussion. We also assume throughout that $H < \infty$. Some examples for which $H = \infty$ are given in Appendix 4.

Some  concrete  examples  for  which  the  entropy  is  a natural  parameter 
are  the  frequencies  of  words  in  a language  and  the  frequencies  of  species  of 
plants or insects in a region. For such populations, the entropy may be regarded as a natural measure of heterogeneity. Many other measures of heterogeneity depend on the classes being numerically indexed, which is a stronger assumption than having a natural ordering.


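We define the random variables $Y_{ij}$, $i = 1, 2, \ldots, N$; $j = 1, 2, \ldots$, by

$$Y_{ij} = \begin{cases} 1 & \text{if } X_i \in M_j, \\ 0 & \text{otherwise.} \end{cases} \tag{3}$$

Then

$$\sum_{j=1}^{\infty} \sum_{i=1}^{N} Y_{ij} = N$$

and

$$Z_j = \sum_{i=1}^{N} Y_{ij}$$

is the number of observations in the $j$th class.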

The "natural" estimator of $H$, denoted by $\hat H$, where

$$\hat H = -\sum_{j=1}^{\infty} \hat p_j \log \hat p_j \tag{4}$$

and

$$\hat p_j = Z_j / N, \qquad j = 1, 2, \ldots, \tag{5}$$

has been studied extensively for the case where the number of classes for which $p_j > 0$ is known and finite. We denote the number of such classes by $s$ in this case and assume that these classes are indexed by $1, 2, \ldots, s$. Then, G. A. Miller and W. G. Madow [9] showed that the limiting ($N \to \infty$) distribution of $N^{1/2}(\hat H - H)$ is normal with mean zero and variance $\sigma^2 = \sum_{j=1}^{s} p_j (\log p_j + H)^2$, provided that not all $p_j = 1/s$. They also showed that if $p_j = 1/s$, $j = 1, 2, \ldots, s$, then $2N(H - \hat H)$ has a limiting chi-square distribution with $s - 1$ degrees of freedom. The Miller-Madow paper is summarized in R. D. Luce [7]. An asymptotic evaluation of $E(\hat H - H)$ is given in G. A. Miller [8]. The above results also appear as special cases of the more general problem of obtaining the limiting distribution of the amount of transmitted information, studied by Z. A. Lomnicki and S. K. Zaremba [6]. Subsequently G. P. Basarin [1] also obtained the asymptotic mean and variance of $\hat H$ and determined the limiting normal distribution as above; however, he failed to note that if $p_j = 1/s$, $j = 1, 2, \ldots, s$, then $N^{1/2}(\hat H - H)$ does not have a proper limiting distribution. Note that in this case,

$$\sum_{j=1}^{s} p_j (\log p_j + H)^2 = 0 .$$
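The degeneracy just noted is easy to see numerically. The following is a minimal sketch, not part of the original paper: the function names `plugin_entropy` and `asymptotic_variance` are illustrative, and the two probability vectors are made-up inputs. It evaluates the natural estimator (4) and the limiting variance $\sigma^2 = \sum_j p_j(\log p_j + H)^2$ with natural logarithms.

```python
import math

def plugin_entropy(counts):
    """Natural estimator (4): -sum p_hat_j log p_hat_j, with p_hat_j = Z_j / N."""
    n = sum(counts)
    return -sum((z / n) * math.log(z / n) for z in counts if z > 0)

def asymptotic_variance(probs):
    """Limiting variance sigma^2 = sum_j p_j (log p_j + H)^2 of sqrt(N) (H_hat - H)."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return sum(p * (math.log(p) + h) ** 2 for p in probs if p > 0)

uniform = [0.25] * 4                  # hypothetical equiprobable population
skewed = [0.5, 0.25, 0.125, 0.125]    # hypothetical non-uniform population

print(asymptotic_variance(uniform))   # zero up to rounding: the normal limit degenerates
print(asymptotic_variance(skewed))    # strictly positive
```

For equal class probabilities every term $\log p_j + H$ vanishes, which is why the normal approximation carries no information in that case and the $O(N^{-2})$ terms have to be examined.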

The paper by G. P. Basarin was subsequently generalized by A. M. Zubkov [10], who permitted $p_1, p_2, \ldots, p_s$ and $s$ to depend on $N$ in such a way that for some $\varepsilon > \delta > 0$, if [...] as $N \to \infty$ and $\max_{1 \le j \le s} (N p_j)^{-1} = O(s/N^{1-\delta})$, then $\sqrt{N}\,(\hat H - E\hat H)/(\sum_j p_j \log^2 p_j - H^2)^{1/2}$ had a limiting standard normal distribution. He also showed that if $s$ is fixed, then $2N(H - \hat H)$ has a limiting chi-square distribution when $\max_{1 \le j \le s} |p_j - s^{-1}| = o(N^{-1/2})$. In particular, note that in Zubkov's theorem, he considered $\hat H - E\hat H$ rather than $\hat H - H$ and required the additional condition that $s / \{N(\sum_j p_j \log^2 p_j - H^2)\}^{1/2} \to 0$ as $N \to \infty$ in order to replace $E\hat H$ by $H$

in the statement of his theorem. This last condition will be violated in many of the applications for which the present technique is intended. In Section 2 we will study the behavior of $\hat H$; here we observe that for the problem at hand, $\hat H$ has certain deficiencies. Roughly speaking, if too much of the probability is distributed over classes with "small $p_j$'s", $\hat H$ will not be a satisfactory estimator. A method for circumventing some of these difficulties is given in Section 3. The alternatives presented here are arrived at through intuitive considerations and a detailed picture of their statistical behavior is not available at present. Some preliminary empirical investigations are presented to suggest the utility of the proposed techniques.

2. Properties of $\hat H$. Here we present a somewhat refined version of some of the Basarin, Miller-Madow results. The refinement is needed to correct one known error in Basarin's paper and also to revise his computation of the asymptotic variance of $\hat H$, which is inadequate when $p_1 = p_2 = \cdots = p_s = 1/s$ and $p_j = 0$, $j > s$.

Basarin considered a multinomial population with a known finite number of classes, that is, we have $p_j > 0$, $j = 1, 2, \ldots, s$ and $p_j = 0$, $j > s$. For the present, we adopt this assumption. Then, expanding in a Taylor series, we can write
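$$\hat H = H - \sum_{j=1}^{s} (\hat p_j - p_j) \log p_j + \sum_{m=2}^{r} \frac{(-1)^{m-1}}{m(m-1)} \sum_{j=1}^{s} \frac{(\hat p_j - p_j)^m}{p_j^{m-1}} + R_{r+1}, \tag{6}$$

where

$$R_{r+1} = \frac{(-1)^r}{r(r+1)} \sum_{j=1}^{s} \frac{(\hat p_j - p_j)^{r+1}}{\xi_j^{\,r}} \tag{7}$$

and

$$\xi_j = \lambda_j p_j + (1 - \lambda_j)\hat p_j, \qquad 0 < \lambda_j < 1 . \tag{8}$$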
From (6), for fixed $j$, $1 \le j \le s$, we have

$$-\hat p_j \log \hat p_j + \hat p_j \log p_j + (\hat p_j - p_j) - \sum_{m=2}^{r} \frac{(-1)^{m-1}}{m(m-1)} \frac{(\hat p_j - p_j)^m}{p_j^{m-1}} = R_{r+1,j} \tag{9}$$

and

$$R_{r+1} = \sum_{j=1}^{s} R_{r+1,j} .$$


Then for any $\varepsilon$, $0 < \varepsilon < 1$, and $|\hat p_j - p_j| < (1 - \varepsilon) p_j$, we can write

$$R_{r+1,j} = \sum_{m=r+1}^{\infty} \frac{(-1)^{m-1}}{m(m-1)} \frac{(\hat p_j - p_j)^m}{p_j^{m-1}}$$


and 


r+1  » 


Now let $A_\varepsilon(p_j) = \{\hat p_j : |\hat p_j - p_j| \ge (1 - \varepsilon) p_j,\ 0 \le \hat p_j \le 1\}$. Then from (7) and (8)

. (.iir  <Vp/+1 
VrVu^rU  t ~ 

?i 


and since $0 < p_j < 1$, $0 \le \hat p_j \le 1$ and $R_{r+1,j} = 0$ if and only if $\hat p_j = p_j$, on $A_\varepsilon(p_j)$, $R_{r+1,j} \ne 0$. Consequently,


Thus 


v ./LiL '&!]'  ,P)/p, 

1 lLr(r+1)  Vi,)  J 'V  1 1 


) 


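Now $A_\varepsilon(p_j)$ is a compact set and $\lambda_j = \lambda_j(\hat p_j)$ is a continuous function of $\hat p_j$ on that set. Thus $\min_{\hat p_j \in A_\varepsilon(p_j)} \lambda_j(\hat p_j)$ is attained and is positive. Hence

$$\min_{1 \le j \le s}\ \min_{\hat p_j \in A_\varepsilon(p_j)} \lambda_j(\hat p_j) = \lambda^* > 0 .$$

Further note that $\lambda^*$ is independent of $N$. Hence define

$$\lambda = \min(\lambda^*, \varepsilon) . \tag{10}$$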
To understand the behavior of $\hat H$ and to motivate the subsequent discussion, we proceed to obtain asymptotic estimates of the mean and variance of $\hat H$ by employing (6). To facilitate the evaluation of these expected values, a tabulation of some required auxiliary formulas and some comments concerning them are contained in Appendix 1 to this paper. In fact, we provide somewhat more formulas than are actually needed, since both the Basarin paper [1] and the book by F. N. David and D. E. Barton [3, page 146] contain some misprints or errors; also, these formulas have frequent applications in problems dealing with multinomial distributions and hopefully will prove to be useful in further studies in the direction of the present paper.

From (6) and (A.1.1)-(A.1.6), we have

$$E\hat H = H + \sum_{m=2}^{r} \frac{(-1)^{m-1}}{m(m-1)} \sum_{j=1}^{s} \frac{E\{(\hat p_j - p_j)^m\}}{p_j^{m-1}} + E R_{r+1} . \tag{11}$$

Then letting $\mu_m(j) = E\{(Z_j - N p_j)^m\}$ and noting that $E\{(\hat p_j - p_j)^m\} = \mu_m(j)/N^m$, we have

$$E\hat H = H + \sum_{m=2}^{r} \frac{(-1)^{m-1}}{m(m-1) N^m} \sum_{j=1}^{s} \frac{\mu_m(j)}{p_j^{m-1}} + E R_{r+1} . \tag{12}$$

From (7), (8), (9) and (10), we have

$$|R_{r+1}| \le \frac{1}{r(r+1)} \sum_{j=1}^{s} \frac{|\hat p_j - p_j|^{r+1}}{\xi_j^{\,r}} \le \frac{1}{r(r+1)} \sum_{j=1}^{s} \frac{|\hat p_j - p_j|^{r+1}}{\lambda^r p_j^{\,r}} .$$
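Consequently,

$$|E R_{r+1}| \le \frac{1}{r(r+1)\lambda^r} \sum_{j=1}^{s} \frac{E|\hat p_j - p_j|^{r+1}}{p_j^{\,r}},$$

and if $r$ is an odd integer $\ge 1$,

$$|E R_{r+1}| \le \frac{1}{r(r+1) N^{r+1} \lambda^r} \sum_{j=1}^{s} \frac{\mu_{r+1}(j)}{p_j^{\,r}} .$$

Thus, from (A.1.16), for $r$ an odd integer $\ge 1$, we have

$$|E R_{r+1}| = O(N^{-(r+1)/2}) . \tag{13}$$

Specifically, using (A.1.1)-(A.1.5) and (13), for $r = 5$, we get

$$E\hat H = H - \frac{1}{2N} \sum_{j=1}^{s} \frac{p_j - p_j^2}{p_j} + \frac{1}{6N^2} \sum_{j=1}^{s} \frac{p_j - 3p_j^2 + 2p_j^3}{p_j^2} - \frac{1}{4N^2} \sum_{j=1}^{s} \frac{p_j^2 - 2p_j^3 + p_j^4}{p_j^3} + O(N^{-3}) .$$

Thus

$$E\hat H = H - \frac{s-1}{2N} + \frac{1}{12N^2}\Bigl(1 - \sum_{j=1}^{s} \frac{1}{p_j}\Bigr) + O(N^{-3}) . \tag{14}$$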
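The passage from the $r = 5$ expansion to (14) can be checked term by term. The sketch below uses only $\sum_j p_j = 1$; the shorthand $T = \sum_{j=1}^{s} 1/p_j$ is introduced here for brevity and is not notation from the paper.

```latex
\begin{align*}
-\frac{1}{2N}\sum_{j=1}^{s}\frac{p_j-p_j^2}{p_j}
  &= -\frac{1}{2N}\sum_{j=1}^{s}(1-p_j) = -\frac{s-1}{2N},\\[1ex]
\frac{1}{6N^2}\sum_{j=1}^{s}\frac{p_j-3p_j^2+2p_j^3}{p_j^2}
  &= \frac{1}{6N^2}\sum_{j=1}^{s}\Bigl(\frac{1}{p_j}-3+2p_j\Bigr) = \frac{T-3s+2}{6N^2},\\[1ex]
-\frac{1}{4N^2}\sum_{j=1}^{s}\frac{p_j^2-2p_j^3+p_j^4}{p_j^3}
  &= -\frac{1}{4N^2}\sum_{j=1}^{s}\Bigl(\frac{1}{p_j}-2+p_j\Bigr) = -\frac{T-2s+1}{4N^2},\\[1ex]
\frac{T-3s+2}{6N^2}-\frac{T-2s+1}{4N^2}
  &= \frac{(2T-6s+4)-(3T-6s+3)}{12N^2} = \frac{1-T}{12N^2}.
\end{align*}
```

The terms in $s$ cancel at order $N^{-2}$, so the second-order bias depends on the $p_j$ only through $\sum_j 1/p_j$.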


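Next we evaluate the mean squared error of $\hat H$, that is, $E\{(\hat H - H)^2\}$.

From (6), we have

$$\begin{aligned}
(\hat H - H)^2 ={}& \sum_{j=1}^{s} \sum_{k=1}^{s} (\hat p_j - p_j)(\hat p_k - p_k) \log p_j \log p_k \\
&+ \sum_{m=2}^{r} \sum_{l=2}^{r} \frac{(-1)^{m+l-2}}{m(m-1)\,l(l-1)} \sum_{j=1}^{s} \sum_{k=1}^{s} \frac{(\hat p_j - p_j)^m (\hat p_k - p_k)^l}{p_j^{m-1} p_k^{l-1}} \\
&- 2 \sum_{m=2}^{r} \frac{(-1)^{m-1}}{m(m-1)} \sum_{j=1}^{s} \sum_{k=1}^{s} \log p_j\, \frac{(\hat p_j - p_j)(\hat p_k - p_k)^m}{p_k^{m-1}} \\
&+ R_{r+1}^2 - 2 R_{r+1} \sum_{j=1}^{s} (\hat p_j - p_j) \log p_j + 2 R_{r+1} \sum_{m=2}^{r} \frac{(-1)^{m-1}}{m(m-1)} \sum_{j=1}^{s} \frac{(\hat p_j - p_j)^m}{p_j^{m-1}} .
\end{aligned} \tag{15}$$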


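We compute the expected value of (15), employing (A.1.1)-(A.1.13) and (A.1.20), obtaining, for $r = 3$,

$$\begin{aligned}
E \sum_{j=1}^{s} \sum_{k=1}^{s} (\hat p_j - p_j)(\hat p_k - p_k) \log p_j \log p_k
&= \sum_{j=1}^{s} \frac{\log^2 p_j\, \mu_2(j)}{N^2} + \sum_{j,k} \frac{\log p_j \log p_k\, \mu_{11}(j,k)}{N^2} - \sum_{j=1}^{s} \frac{\log^2 p_j\, \mu_{11}(j,j)}{N^2} \\
&= \frac{1}{N} \Bigl( \sum_{j=1}^{s} p_j \log^2 p_j - H^2 \Bigr),
\end{aligned} \tag{16}$$

$$\begin{aligned}
E \sum_{m=2}^{3} \sum_{l=2}^{3} \frac{(-1)^{m+l-2}}{m(m-1)\,l(l-1)} \sum_{j=1}^{s} \sum_{k=1}^{s} \frac{(\hat p_j - p_j)^m (\hat p_k - p_k)^l}{p_j^{m-1} p_k^{l-1}}
&= \frac{1}{4N^4} \Bigl( \sum_{j,k} \frac{\mu_{22}(j,k)}{p_j p_k} + \sum_{j} \frac{\mu_4(j)}{p_j^2} - \sum_{j} \frac{\mu_{22}(j,j)}{p_j^2} \Bigr) + O(N^{-3}) \\
&= \frac{1}{4N^2} \Bigl[ (3 - 2s + s^2) + 3\Bigl(s - 2 + \sum_j p_j^2\Bigr) - \Bigl(3\sum_j p_j^2 - 2 + s\Bigr) \Bigr] + O(N^{-3}) \\
&= \frac{1}{4N^2} (s^2 - 1) + O(N^{-3}),
\end{aligned} \tag{17}$$

and

$$\begin{aligned}
-2E \sum_{m=2}^{3} \frac{(-1)^{m-1}}{m(m-1)} \sum_{j=1}^{s} \sum_{k=1}^{s} \log p_j\, \frac{(\hat p_j - p_j)(\hat p_k - p_k)^m}{p_k^{m-1}}
&= \frac{1}{N^3} \Bigl( \sum_{j,k} \frac{\log p_j\, \mu_{21}(k,j)}{p_k} + \sum_j \frac{\log p_j\, \mu_3(j)}{p_j} - \sum_j \frac{\log p_j\, \mu_{21}(j,j)}{p_j} \Bigr) \\
&\quad - \frac{1}{3N^4} \Bigl( \sum_{j,k} \frac{\log p_j\, \mu_{31}(k,j)}{p_k^2} + \sum_j \frac{\log p_j\, \mu_4(j)}{p_j^2} - \sum_j \frac{\log p_j\, \mu_{31}(j,j)}{p_j^2} \Bigr) \\
&= \frac{1}{N^2} \Bigl( \sum_j \log p_j + sH \Bigr) - \frac{1}{N^2} \Bigl( sH + \sum_j \log p_j \Bigr) + O(N^{-3}) \\
&= O(N^{-3}) .
\end{aligned} \tag{18}$$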


We now consider the three terms in (15) which contain $R_4$ as a factor. To consider the first of these terms, we write

$$R_4 \sum_{j=1}^{s} (\hat p_j - p_j) \log p_j = \sum_{j=1}^{s} (\hat p_j - p_j) \log p_j \Bigl\{ -\sum_{k=1}^{s} \frac{(\hat p_k - p_k)^4}{12\, p_k^3} + \sum_{k=1}^{s} \frac{(\hat p_k - p_k)^5}{20\, p_k^4} + R_6 \Bigr\} . \tag{19}$$

The expected value can easily be estimated using (A.1.5), (A.1.6), (11), (A.1.7), (A.1.2), the Cauchy-Schwarz inequality, (A.1.16) and (A.1.20). We obtain

$$E\, R_4 \sum_{j=1}^{s} (\hat p_j - p_j) \log p_j = O(N^{-3}) . \tag{20}$$

The extensive computation indicated in (19) appeared to be essential, since a direct application of the Cauchy-Schwarz inequality yields an estimate of $O(N^{-5/2})$.

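Similarly, from (11), (A.1.20), and (A.1.16), it follows readily that

$$E\, R_4 \sum_{m=2}^{3} \frac{(-1)^{m-1}}{m(m-1)} \sum_{j=1}^{s} \frac{(\hat p_j - p_j)^m}{p_j^{m-1}} = O(N^{-3}) . \tag{21}$$

Combining (16)-(21), we obtain

$$E(\hat H - H)^2 = \frac{1}{N} \Bigl[ \sum_{j=1}^{s} p_j \log^2 p_j - H^2 \Bigr] + \frac{1}{4N^2} (s^2 - 1) + O(N^{-3}) . \tag{22}$$

From (14) and (22), we obtain

$$\sigma^2_{\hat H} = E(\hat H - H)^2 - (E\hat H - H)^2 = \frac{1}{N} \Bigl( \sum_{j=1}^{s} p_j \log^2 p_j - H^2 \Bigr) + \frac{s-1}{2N^2} + O(N^{-3}) . \tag{23}$$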
The preceding discussion enables us to observe a variety of shortcomings when one employs $\hat H$ as an estimator in the more general situation described in Section 1.

First, from (14), we see that the bias of $\hat H$ depends on $s$, the number of classes. If $s$ is known, the bias can be largely removed by replacing $\hat H$ by $\hat H + \frac{s-1}{2N}$; however, we have assumed that $s$ is unknown. Secondly, it should be noted that the bias increases with $s$. Thus if we permit $s$ to grow, or if $s$ is unknown, the bias may be large. In particular, we are interested in the case where $s$ may be of the same magnitude as $N$. In this case, we would have to regard $s = s(N)$ and $p_j = p_j(N)$. However, from (22) or (14), it is apparent that $\hat H - H$ will not generally tend to zero in probability. Intuitively, if too much of the total probability is concentrated on cells that are too small, then $\hat H$ will not be a satisfactory estimator.

In the examination of the properties of $\hat H$, we found it desirable to extend Basarin's computations to terms of $O(N^{-2})$. This is desirable whenever $p_j = 1/s$, $j = 1, 2, \ldots, s$. In that case,

$$G(p_1, p_2, \ldots, p_s) = \sum_{j=1}^{s} p_j \log^2 p_j - H^2 = s \cdot \frac{1}{s} \log^2 s - \log^2 s = 0$$

and a useful asymptotic estimate of $\sigma^2_{\hat H}$ or $E(\hat H - H)^2$ is not obtained from the leading term alone.

In summary, if $s$ is known, or known to be bounded (independent of $N$), or if the total probability of "small classes" is known to be small, then $\hat H$ will have satisfactory properties. In Appendix 2, the maximum of $G(p_1, p_2, \ldots, p_s)$ is obtained. This can be utilized in determining the sample size necessary to obtain a specified mean squared error when $s$ is known and $\hat H$ is used as the estimator of $H$.


3. Quadrature methods of estimating $H$. Let

$$R(p_1, p_2, \ldots) = \sum_{j=1}^{\infty} N p_j e^{-N p_j} . \tag{24}$$

We define the distribution function

$$F(x) = \sum_{N p_j \le x} N p_j e^{-N p_j} \big/ R(p_1, p_2, \ldots) . \tag{25}$$

Then, it follows that

$$\frac{R(p_1, p_2, \ldots)}{N} \int_0^N e^x \log\Bigl(\frac{N}{x}\Bigr)\, dF(x) = \frac{1}{N} \sum_{j=1}^{\infty} e^{N p_j} \log\Bigl(\frac{N}{N p_j}\Bigr) N p_j e^{-N p_j} = -\sum_{j=1}^{\infty} p_j \log p_j = H . \tag{26}$$

Thus, it is clear that if we knew $p_1, p_2, \ldots$, we would therefore know $F(x)$ and consequently know $H$. The procedure is to use the data to obtain an estimate of $F(x)$ and thus to obtain an estimate of $H$, which we denote by $\tilde H$.
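Identity (26) can be confirmed numerically for any finite population. The sketch below is illustrative only: the function names are hypothetical, and the test population (1000 cells with $p_j = 1/4000$, 1000 cells with $p_j = 3/4000$, $N = 1000$) is an assumption chosen so that $F$ has just two atoms.

```python
import math

def entropy_direct(probs):
    """H = -sum p_j log p_j, natural logarithms."""
    return -sum(p * math.log(p) for p in probs)

def entropy_via_F(probs, n):
    """Evaluate (R/N) * integral of e^x log(N/x) dF(x) for the step function F of (25)."""
    r = sum(n * p * math.exp(-n * p) for p in probs)      # R(p_1, p_2, ...) of (24)
    integral = 0.0
    for p in probs:
        x = n * p                                          # location of an atom of F
        jump = x * math.exp(-x) / r                        # mass F places there
        integral += math.exp(x) * math.log(n / x) * jump
    return r / n * integral

probs = [1 / 4000] * 1000 + [3 / 4000] * 1000
n = 1000
print(entropy_direct(probs), entropy_via_F(probs, n))      # the two values agree
```

The factor $e^x$ in the integrand exactly undoes the weight $e^{-Np_j}$ built into $F$, which is why the integrand $e^x \log(N/x)$ appears in the quadrature rules that follow.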

Specifically, we propose to write (26) in the form

$$H = \int_0^N g(x)\, dF(x), \tag{27}$$

and to estimate $H$ by

$$\tilde H = \sum_{i=1}^{d} g(x_i)\, w_i, \tag{28}$$

where the points $x_i$ and the weights $w_i$ are to be determined from the data. We now proceed to the construction of quadrature formulas of the form of (28).

Let $n_r$ be the number of cells occurring $r$ times in the sample. Trivially, we have

$$\sum_{r=1}^{N} r\, n_r = N . \tag{29}$$

From Appendix 3, we have that

$$E n_r \sim \sum_{j=1}^{\infty} \frac{(N p_j)^r}{r!} e^{-N p_j}, \qquad r = 1, 2, \ldots, k, \tag{30}$$

where $k$ does not depend on $N$. The reader should refer to the appendix for details concerning the sense in which the symbol $\sim$ is used here. The moments of $F(x)$, denoted by $\mu_r$, are given by
$$\mu_r = \int_0^N x^r\, dF(x) = \sum_{j} (N p_j)^{r+1} e^{-N p_j} \big/ R(p_1, p_2, \ldots) \sim (r+1)!\, E(n_{r+1}) / E(n_1) . \tag{31}$$

The observed values of $n_r$ may be regarded as estimates of $E(n_r)$ whenever $n_1 \ne 0$. In this case, we can regard

$$m_r = (r+1)!\, n_{r+1} / n_1, \qquad r = 1, 2, \ldots, k, \tag{32}$$

as estimates of the first $k$ moments.
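As a small illustration of (32), the sketch below computes the moment estimates from a table of occupancy numbers and applies the simplest consistency condition, $m_2 \ge m_1^2$. The function name `moment_estimates` and the dictionary representation (mapping $r$ to $n_r$) are choices made here, not the paper's; the data are the occupancy numbers reported in Example 3 of Section 4.

```python
import math

def moment_estimates(occupancy, k):
    """Return [m_1, ..., m_k] with m_r = (r+1)! n_{r+1} / n_1, as in (32).

    occupancy maps r to n_r, the number of cells observed exactly r times.
    """
    n1 = occupancy.get(1, 0)
    if n1 == 0:
        raise ValueError("n_1 = 0: (32) is undefined, use the natural estimator (4)")
    return [math.factorial(r + 1) * occupancy.get(r + 1, 0) / n1
            for r in range(1, k + 1)]

occupancy = {1: 373, 2: 199, 3: 62, 4: 8, 5: 1, 6: 1}   # Example 3, N = 1000
m = moment_estimates(occupancy, 5)
variance_ok = m[1] >= m[0] ** 2      # simplest consistency condition on (m_1, m_2)
print([round(v, 5) for v in m], variance_ok)
```

Here the condition fails, so only $m_1$ is usable for these data, in agreement with the discussion of Example 3.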

We proceed as follows. If $n_1 = 0$, estimate $H$ by (4). If $n_1 \ne 0$, select $k$ and determine $m_1, m_2, \ldots, m_k$. Using these as estimates of the moments, we seek to determine a distribution function whose first $k$ moments are $m_1, m_2, \ldots, m_k$. Unfortunately, it may happen that the "sample moments" $m_1, m_2, \ldots, m_k$ are inconsistent. That is, since these are estimates of the moments of (25) and subject to sampling fluctuations, it is possible that there is no distribution function on $[0, N]$ with $m_1, m_2, \ldots, m_k$ as its first $k$ moments. Consequently, we compare $m_1, m_2, \ldots, m_l$, $l \le k$, with the consistency conditions, which may be found in B. Harris [4]; the simplest of these conditions is $m_2 \ge m_1^2$. If $m_l$, $1 \le l \le k$, is the last moment estimate which satisfies these conditions, we employ $m_1, m_2, \ldots, m_l$ in determining $\hat F_l(x)$, the estimator of $F(x)$ used in determining $\tilde H$.

From (31) and from Appendix 3, we can easily see that it is the "small probabilities" that contribute to $E n_r$, $r = 1, 2, \ldots, k$, and thus an estimator of $F(x)$ constructed in this manner will use mainly the information contained in the "small $p_j$'s". For the "large $p_j$'s", the estimation of $p_j$ by $\hat p_j$ is satisfactory. To estimate the part of the data that should be assigned to "large $p_j$'s", the following procedure is followed. Once $\hat F_l(x)$ is determined, we compute

$$\mu_r(\hat F_l) = \int_0^N x^r\, d\hat F_l(x), \qquad r = l+1, l+2, \ldots, \tag{33}$$

from which we obtain, using (31),

$$\hat n_{r+1} = \mu_r(\hat F_l)\, n_1 / (r+1)!, \qquad r = l+1, l+2, \ldots . \tag{34}$$

From these estimates, we define

$$w_{l,r+1} = \begin{cases} n_{r+1} - \hat n_{r+1} & \text{if } n_{r+1} - \hat n_{r+1} > 0, \\ 0 & \text{otherwise.} \end{cases} \tag{35}$$

$w_{l,j}$ provides an estimate of the contribution to the occupancy numbers accounted for by the "large cells", that is, not included in $\hat F_l(x)$. A further modification is necessitated in the case of the Gauss quadrature formulas, which will be discussed subsequently.

Thus combining the heuristic arguments given above, we obtain

$$\tilde H_l = \frac{n_1}{N} \int_0^N e^x \log\Bigl(\frac{N}{x}\Bigr)\, d\hat F_l(x) + \sum_{k \ge l+2} \frac{k\, w_{l,k}}{N} \log\Bigl(\frac{N}{k}\Bigr), \tag{36}$$

which is easily seen to have the form (28).
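To amplify and illustrate the above principles, we proceed by using the Gaussian quadrature formulas, which are the simplest to employ. Then we have for $\hat F_l(x)$, $l = 1, 2, 3$, the following:

$$\hat F_1(x) = \begin{cases} 0, & x < m_1, \\ 1, & m_1 \le x; \end{cases} \tag{37}$$

$$\hat F_2(x) = \begin{cases} 0, & x < \dfrac{N m_1 - m_2}{N - m_1}, \\[2ex] \dfrac{(N - m_1)^2}{(N - m_1)^2 + (m_2 - m_1^2)}, & \dfrac{N m_1 - m_2}{N - m_1} \le x < N, \\[2ex] 1, & x \ge N; \end{cases} \tag{38}$$

$$\hat F_3(x) = \begin{cases} 0, & x < \dfrac{m_1 z + m_2 - m_1^2}{z}, \\[2ex] \dfrac{z^2}{z^2 + m_2 - m_1^2}, & \dfrac{m_1 z + m_2 - m_1^2}{z} \le x < m_1 - z, \\[2ex] 1, & m_1 - z \le x, \end{cases} \tag{39}$$

where

$$z = \frac{-M_3 - \sqrt{M_3^2 + 4(m_2 - m_1^2)^3}}{2(m_2 - m_1^2)} \tag{40}$$

and

$$M_3 = m_3 - 3 m_1 (m_2 - m_1^2) - m_1^3 . \tag{41}$$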
The Gauss quadrature formulas listed here have the attribute that for $l$ an even integer, positive probability is placed at $N$. Thus as $N \to \infty$, this provides an asymptotic lower bound for $H$ (see E. B. Cobb and B. Harris [2]). Simultaneously, the use of (34) provides overestimates for $\hat n_{r+1}$. For odd values of $l$, the use of $\hat F_l$ minimizes the higher moments, suggesting that this will account for the information contained in the "small $p_j$'s" in a reasonable way. Accordingly, in the examples that follow, we have used the minimum values of the moments in (34), feeling that this will be appropriate. Thus (34) and $\hat F_l(x)$ for odd values of $l$ are to be regarded as providing the estimates we seek. We report the results for even values of $l$ as well in the numerical examples that follow for purposes of comparison. The apparent negative bias is to be noted in each example.

We now turn to some numerical examples to clarify the preceding discussion and to provide numerical comparisons for purposes of justifying the proposed technique and the heuristic arguments which suggest it.

4. Numerical examples. The examples which follow are intended to provide comparisons between $\hat H$ and $\tilde H_l$. We present these in substantial detail with extensive discussion so that the ideas and computational procedures are clear. Some are artificial in the sense that expected values are employed instead of "random data". This has the following purpose: if the techniques described here perform poorly when the data are "perfect", then they should do even worse when random fluctuations are imposed.
Example 1. $p_j = \frac{1}{4}$, $j = 1, 2, 3, 4$, $Z_1 = 30$, $Z_2 = 27$, $Z_3 = 21$, $Z_4 = 22$, $N = 100$, $H = \log 4 = 1.38629$, $\hat H = 1.37556$.

From (14), we have that $E\hat H \sim 1.37117$, and from (23) and (22), $\sigma^2_{\hat H} = .00015$ and $E(\hat H - H)^2 = .000375$. Note that if $s$ is assumed known, we can improve $\hat H$ by correcting for the bias, obtaining $\hat H + \frac{s-1}{2N} = 1.39056$.
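The figures quoted in Example 1 can be reproduced directly. The sketch below is illustrative: the function names are hypothetical, and `expected_plugin`, `mse_plugin` and `variance_plugin` are the right-hand sides of (14), (22) and (23) as read here, with the $O(N^{-3})$ remainders dropped.

```python
import math

def plugin_entropy(counts):
    """Natural estimator (4) computed from class counts Z_j."""
    n = sum(counts)
    return -sum((z / n) * math.log(z / n) for z in counts if z > 0)

def _h_and_g(probs):
    h = -sum(p * math.log(p) for p in probs)
    g = sum(p * math.log(p) ** 2 for p in probs) - h * h   # G(p_1, ..., p_s)
    return h, g

def expected_plugin(probs, n):
    """Approximation (14) to E(H_hat)."""
    h, _ = _h_and_g(probs)
    s = len(probs)
    return h - (s - 1) / (2 * n) + (1 - sum(1 / p for p in probs)) / (12 * n * n)

def mse_plugin(probs, n):
    """Approximation (22) to E(H_hat - H)^2."""
    _, g = _h_and_g(probs)
    s = len(probs)
    return g / n + (s * s - 1) / (4 * n * n)

def variance_plugin(probs, n):
    """Approximation (23) to the variance of H_hat."""
    _, g = _h_and_g(probs)
    s = len(probs)
    return g / n + (s - 1) / (2 * n * n)

counts, probs, n = [30, 27, 21, 22], [0.25] * 4, 100
h_hat = plugin_entropy(counts)
h_corrected = h_hat + (len(counts) - 1) / (2 * n)   # bias correction, s assumed known
print(h_hat, h_corrected, expected_plugin(probs, n),
      mse_plugin(probs, n), variance_plugin(probs, n))
```

Because $G = 0$ for equal probabilities, both the mean squared error and the variance here come entirely from the $N^{-2}$ terms.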

Example 2. $p_j = 10^{-3}$, $j = 1, 2, \ldots, 10^3$, $N = 100$. In such a population, $\hat H$ should not perform too well, since the cell probabilities are all very small compared to $N^{-1}$. Here $H = 6.90776$. This type of population is very favorable to the quadrature method, since $F(x)$ is a degenerate distribution with probability one at $N p_j = .1$ and is therefore completely determined by $\mu_1$ (that is, $\mu_2 = \mu_1^2$, $\mu_3 = \mu_1^3, \ldots$). The data are $n_1 = 85$, $n_2 = 6$, $n_3 = 1$. Thus, $m_1 = .14118$, $m_2 = .07059$. Further note that $\hat H = 4.48903$; also we always have $\hat H \le \log 100 = 4.60517$.

For $k = 1$, we have $w_3 = .71765$. Thus $\tilde H_1 = 6.49982$. For $k = 2$, we have $\tilde H_2 = 6.42456$. We are not able to proceed to $k = 3$, since $n_4 = 0$ ensures that the consistency conditions for $m_1, m_2, m_3$ to be a valid moment sequence on $[0, N]$ are not satisfied.
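The one-point case $l = 1$ is simple enough to write out in full. The following is a sketch under stated assumptions rather than a definitive implementation: the function name `harris_h1` and the dictionary input are choices made here; $R$ is estimated by $n_1$; the large-cell sum runs over $k \ge 3$; and both $n_1$ and $n_2$ are assumed positive so that $m_1 > 0$.

```python
import math

def harris_h1(occupancy, n):
    """One-point (l = 1) quadrature estimate of H, following (32)-(37).

    occupancy maps r to n_r, the number of cells observed exactly r times;
    n is the sample size N.
    """
    n1 = occupancy[1]
    m1 = 2 * occupancy.get(2, 0) / n1                  # (32) with r = 1
    # F_hat_1 puts all its mass at m1, so the integral in (36) is a single term.
    estimate = (n1 / n) * math.exp(m1) * math.log(n / m1)
    for k in range(3, max(occupancy) + 1):
        r = k - 1
        n_hat = m1 ** r * n1 / math.factorial(k)       # (34), since mu_r(F_hat_1) = m1^r
        w = max(occupancy.get(k, 0) - n_hat, 0.0)      # (35)
        estimate += k * w / n * math.log(n / k)        # large-cell term of (36)
    return estimate

example_2 = {1: 85, 2: 6, 3: 1}                        # N = 100
print(harris_h1(example_2, 100))                       # close to the 6.49982 quoted above
```

Applied to the occupancy numbers of Examples 3 and 4 below (with $N = 1000$), the same function returns values that agree with the quoted $\tilde H_1$ to about three decimals.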

The estimates $\tilde H_1$ and $\tilde H_2$ are lower than $H$. However, this is precisely as it should be, since $E n_1 \sim 90$, $E n_2 \sim 4.5$, $E n_3 \sim .15$ and thus, as a consequence of sampling fluctuations, the data look as if they came from a distribution which does not have equal probabilities for all cells.

Example 3. $p_j = 10^{-3}$, $j = 1, 2, \ldots, 10^3$, $N = 1000$, $n_1 = 373$, $n_2 = 199$, $n_3 = 62$, $n_4 = 8$, $n_5 = 1$, $n_6 = 1$, $H = 6.90776$.

For these data $\hat H = 6.36438$, $m_1 = 1.06702$, $m_2 = .99732$, $m_3 = .51475$, $m_4 = .32172$, $m_5 = 1.93029$.

To compute $\tilde H_1$, we first set $k = 1$, obtaining $w_3 = 0$, $w_4 = 0$, $w_5 = 0$, $w_6 = .28345$. Thus, we get

$$\tilde H_1 = 7.42779 .$$

Since $m_2 < m_1^2$, the process terminates here. Here, the overestimate is precisely what one would expect from the data, since $E n_1 \sim 368$. The observed value of $n_1$ suggests a larger number of cells than are actually at hand.

Example 4. This example is identical with Example 3 except that $n_1 = 341$, $n_2 = 179$, $n_3 = 70$, $n_4 = 17$, $n_5 = 2$, $n_6 = 1$, $n_7 = 1$. Then $\hat H = 6.29417$, $m_1 = 1.04985$, $m_2 = 1.23167$, $m_3 = 1.19648$, $m_4 = .70381$, $m_5 = 2.11144$, $m_6 = 14.78006$.

For $k = 1$, we have $w_3 = 7.35875$, $w_4 = .55847$, $w_5 = 0$, $w_6 = .39596$, $w_7 = .90941$ and hence $\tilde H_1 = 6.86725$.

For $k = 2$, $w_i = 0$, $i = 4, 5$, $w_6 = .05808$, $w_7 = .84214$, $\tilde H_2 = 6.71320$.

We are unable to proceed to $\tilde H_3$, since the sequence $m_1, m_2, m_3$ is not a realizable moment sequence.

We now choose an example for which $F(x)$ is again a one-point distribution, but since $N p_j = 2$, the $n_r$'s will be non-zero for larger values of $r$.

Example 5. $p_j = 2/1000$, $j = 1, 2, \ldots, 500$, $N = 1000$, $n_1 = 139$, $n_2 = 146$, $n_3 = 78$, $n_4 = 42$, $n_5 = 21$, $n_6 = 5$, $n_7 = 2$, $n_8 = 1$, $n_{10} = 1$. Then $m_1 = 2.10072$, $m_2 = 3.36691$, $m_3 = 7.25180$, $m_4 = 18.12950$. Here $m_2 < m_1^2$, so that we compute only $\tilde H_1$. $H = 6.21461$ and $\hat H = 5.9257$. We obtain $\tilde H_1 = 7.06270$.

Example 6. Let $p_1 = p_2 = p_3 = p_4 = 1/8$, $p_i = 1/(2 \cdot 10^3)$, $i = 5, 6, \ldots, 1004$, $N = 200$. The data obtained are $n_1 = 86$, $n_2 = 4$, $n_5 = 1$, $n_{23} = 1$, $n_{24} = 2$, $n_{30} = 1$. For this population $H = 4.84017$. From the data, we have $\hat H = 3.59686$ and $\tilde H_1 = 4.75552$.

The following examples are artificial in the sense that instead of random data, the expected values of the $n_r$ are employed for "small" values of $r$.


Example 7. We are given 2000 cells, 1000 of which have $p_j = 1/4000$ and the balance of which have $p_j = 3/4000$. One thousand observations are taken, so that $N p_j$ is either $.25$ or $.75$. We will examine the behavior of $\hat H$ and $\tilde H_l$ as if the $n_r$'s were exactly equal to $E n_r$. Such examples serve to illustrate the motivation for the quadrature method. In this example $E n_1 = 548.9751$, $E n_2 = 157.1906$, $E n_3 = 35.2414$, $E n_4 = 6.3543$, $E n_5 = .9404$, $E n_6 = .1171$, $E n_7 = .0125$, $E n_8 = .0011$, $H = 7.47009$, and $\hat H = 6.52939$. Thus, even with the use of $E n_r$ in place of $n_r$, $\hat H$ has a sizeable negative bias. On the other hand, $\tilde H_1 = 7.41776$, $\tilde H_2 = 7.28016$, and $\tilde H_3 = H$. This last occurs since

$$F(x) = \begin{cases} 0, & x < .25, \\ .35466, & .25 \le x < .75, \\ 1, & x \ge .75 . \end{cases}$$

That is, $F(x)$ is a two-point distribution and it can easily be seen that every two-point distribution is uniquely characterized by three moments. Thus, if $n_r = E n_r$, it follows that $\hat F_3(x) = F(x)$ and $\tilde H_3 = H$.
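The claim that three moments pin down a two-point distribution can be checked against formulas (39)-(41). The sketch below is illustrative: the function name is hypothetical, and the moments fed in are those of the two-point $F(x)$ displayed in this example, computed from its atoms $.25$ and $.75$.

```python
import math

def two_point_from_moments(m1, m2, m3):
    """Atoms and lower-atom mass of the two-point law with moments m1, m2, m3."""
    var = m2 - m1 ** 2
    if var <= 0:
        raise ValueError("need m2 > m1^2 for a genuine two-point distribution")
    mu3 = m3 - 3 * m1 * var - m1 ** 3                  # third central moment, as in (41)
    z = (-mu3 - math.sqrt(mu3 ** 2 + 4 * var ** 3)) / (2 * var)   # negative root, as in (40)
    lower = (m1 * z + var) / z                         # first jump point of (39)
    upper = m1 - z                                     # second jump point of (39)
    mass_lower = z ** 2 / (z ** 2 + var)               # height of the first jump
    return lower, upper, mass_lower

a, b = 0.25, 0.75
fa, fb = a * math.exp(-a), b * math.exp(-b)            # unnormalized jumps of F
p = fa / (fa + fb)
moments = [p * a ** r + (1 - p) * b ** r for r in (1, 2, 3)]
lower, upper, mass = two_point_from_moments(*moments)
print(lower, upper, mass)                              # recovers .25, .75 and about .35466
```

Since $z$ solves $(m_2 - m_1^2) z^2 + M_3 z - (m_2 - m_1^2)^2 = 0$, the negative root places the lighter-side atom below $m_1$ and the other at $m_1 - z$.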

Example 8. This example is extremely artificial, but serves nevertheless to illustrate one of the possible boundary situations which clarify the differences between $\hat H$ and $\tilde H$. Assume that we are sampling from a probability distribution that is absolutely continuous with respect to Lebesgue measure on the real line. Every real number is considered to be a separate class. Then $n_1 = N$ with probability one. Here, one should define $H = \infty$, $\hat H = \log N$ and $\tilde H = \infty$.

Example 9. The Zipf Distribution. A common mathematical model for describing linguistic as well as other data is the Zipf distribution given by

$$p_j = (\zeta(s))^{-1} j^{-s}, \qquad s > 1, \quad j = 1, 2, \ldots, \tag{42}$$

where $\zeta(s)$ denotes the Riemann zeta function. This distribution is suited for a test of the quadrature estimates proposed in this paper, since for "small values" of $s$, there are both classes with large probabilities and a substantial concentration of the total probability in small cells. For a specific numerical illustration, we will take $s = 3/2$.

We evaluate $H$ and $E(n_r)$ for the Zipf distributions by means of the Euler-Maclaurin formula, which is particularly suited for this case.

First we have
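$$H = -\sum_{j=1}^{\infty} p_j \log p_j = \sum_{j=1}^{\infty} (j^s \zeta(s))^{-1} (s \log j + \log \zeta(s)) = \log \zeta(s) + s(\zeta(s))^{-1} \sum_{j=1}^{\infty} \frac{\log j}{j^s} . \tag{43}$$

We employ the Euler-Maclaurin formula to evaluate $\sum_{j=1}^{\infty} \log j / j^s$. To accomplish this we write

$$\sum_{j=1}^{\infty} \frac{\log j}{j^s} = \sum_{j=1}^{M-1} \frac{\log j}{j^s} + \sum_{j=M}^{\infty} \frac{\log j}{j^s} .$$

Then

$$\sum_{j=1}^{\infty} \frac{\log j}{j^s} = \sum_{j=1}^{M-1} \frac{\log j}{j^s} + M^{-s+1} \Bigl( \frac{\log M}{s-1} + \frac{1}{(s-1)^2} \Bigr) + \frac{\log M}{2M^s} - \sum_{v=1}^{m} \frac{B_{2v}}{(2v)!} \frac{d^{2v-1}}{dM^{2v-1}} \Bigl( \frac{\log M}{M^s} \Bigr) + R_m(M), \tag{44}$$

where $B_{2v}$ are the Bernoulli numbers.

We can similarly estimate $E(n_r)$, $r = 1, 2, \ldots$. That is,

$$E(n_r) \sim \sum_{j=1}^{\infty} \frac{(N p_j)^r e^{-N p_j}}{r!} = \frac{1}{r!} \sum_{j=1}^{\infty} \Bigl[ \frac{N}{\zeta(s) j^s} \Bigr]^r e^{-N/(\zeta(s) j^s)} .$$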
Here the Euler-Maclaurin formula is also applicable and we obtain, using only the initial term of the expansion,

$$E(n_r) \sim \frac{N^{1/s}\, \Gamma(r - s^{-1})}{s\, (\zeta(s))^{1/s}\, r!} . \tag{45}$$

We now apply this to a specific numerical illustration, setting $s = 3/2$, $N = 1000$.

Thus, we have, for $r = 1, 2, \ldots, 10$:

    r     E(n_r) ~
    1     94.15584
    2     15.69264
    3      6.97451
    4      4.06846
    5      2.71231
    6      1.95889
    7      1.49249
    8      1.18155
    9       .96275
    10      .80229


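To determine $H$, we note that using (44) with $M = 4$, $m = 2$ and $s = 3/2$, we get

$$\begin{aligned}
\sum_{j=1}^{\infty} \frac{\log j}{j^{3/2}} &= \sum_{j=1}^{3} \frac{\log j}{j^{3/2}} + 4^{-1/2} \Bigl( \frac{\log 4}{1/2} + \frac{1}{(1/2)^2} \Bigr) + \frac{\log 4}{2 \cdot 4^{3/2}} - \sum_{v=1}^{2} \frac{B_{2v}}{(2v)!} \frac{d^{2v-1}}{dM^{2v-1}} \Bigl( \frac{\log M}{M^{3/2}} \Bigr) \Big|_{M=4} + R_2(M) \\
&= .45649 + 3.38629 + .08664 + .00281 + R_2(M),
\end{aligned}$$

where $|R_2(M)| < 2.5 \times 10^{[\ldots]}$. Thus it follows that we can write

$$H = \log \zeta(3/2) + \frac{3}{2\, \zeta(3/2)} \Bigl( \sum_{j=1}^{\infty} \frac{\log j}{j^{3/2}} \Bigr) \sim 3.21811 .$$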
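The Zipf computations can be reproduced with a few lines. The sketch below is an illustration, not the paper's procedure verbatim: the function names are hypothetical, the Euler-Maclaurin expansion is truncated after the $B_2$ term, and the cutoff $M = 1000$ used by default is a choice made here to keep the truncation error negligible.

```python
import math

def zeta(s, m=1000):
    """zeta(s), s > 1: partial sum to M - 1 plus an Euler-Maclaurin tail from M."""
    head = sum(j ** -s for j in range(1, m))
    tail = m ** (1 - s) / (s - 1) + 0.5 * m ** -s + s * m ** (-s - 1) / 12
    return head + tail

def log_sum(s, m=1000):
    """sum over j >= 1 of log(j) / j^s, using (44) truncated after the B_2 term."""
    head = sum(math.log(j) / j ** s for j in range(2, m))
    integral = m ** (1 - s) * (math.log(m) / (s - 1) + 1 / (s - 1) ** 2)
    half_term = math.log(m) / (2 * m ** s)
    # -(B_2 / 2!) * d/dM (log M / M^s), with B_2 = 1/6
    b2_term = -(1 - s * math.log(m)) / (12 * m ** (s + 1))
    return head + integral + half_term + b2_term

def zipf_entropy(s):
    """H from (43)."""
    z = zeta(s)
    return math.log(z) + s / z * log_sum(s)

def expected_occupancy(r, n, s):
    """Leading term (45) for E(n_r) under the Zipf law."""
    return n ** (1 / s) * math.gamma(r - 1 / s) / (
        s * zeta(s) ** (1 / s) * math.factorial(r))

print(log_sum(1.5, m=4))                  # compare with .45649 + 3.38629 + .08664 + .00281
print(zipf_entropy(1.5))                  # compare with 3.21811
print(expected_occupancy(1, 1000, 1.5))   # compare with 94.15584
```

Successive values of (45) satisfy $E(n_{r+1})/E(n_r) = (r - s^{-1})/(r + 1)$, which gives a quick check on the tabulated column.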


We make the assumption that for each $r$, $r = 1, 2, \ldots$, $n_r = E(n_r)$. This enables us to compare $\hat H$ and $\tilde H_l$ when the data are perfect, that is, when the data are completely devoid of sampling errors. Given this artificial assumption, we have $\hat H = 2.82871$, $\tilde H_1 = 3.00146$, $\tilde H_2 = 2.84048$, $\tilde H_3 = 3.03918$.

Specifically, $\hat H$, $\tilde H_1$, $\tilde H_2$, and $\tilde H_3$ were computed here as follows in order to obtain a reasonable comparison of their behavior. For the largest probabilities $p_1, p_2, \ldots$, the assumption that $\hat p_j = p_j$ was made. This was employed, since when $p$ is large, both techniques give virtually the same result. The remaining $p$'s were distributed according to their contributions to $E(n_r)$, as in the preceding examples. For detailed information about the Zipf distribution and extensive references to articles about the distribution and its applications, see N. L. Johnson and S. Kotz [5, pp. 240-247].

5. Concluding Remarks. The estimator $\tilde H$ described in the preceding sections is to be regarded as a first attempt to produce an estimator which can circumvent the deficiencies of the natural estimator $\hat H$. The procedure is by no means completely analyzed and it is hoped that this work will stimulate further investigations into its behavior.
The following remarks are therefore relevant. We have chosen the Gauss quadrature formulas, because they are among the simplest. If we fail to analyze completely the problem for Gauss quadrature formulas, then we are unlikely to be successful in more complicated situations. The selection of $k$ produces problems, since for greater values of $k$, we utilize more information from the sample; however, the higher moments are less reliable statistically.

2 

Thus,  a way  of  balancing  these  two  properties  is  needed.  Second,  if  m^  < , 

2 

we  have  set  . However,  we  could  also  have  increased  m^  to  n/  m^  , 

or  chosen  any  alternative  in  between.  Here  again,  further  investigation  is 
needed. The same remarks apply to the determination of $w$ in (35). The procedure that we have used provides a sequence of quadrature formulas which give better estimates as we increase $k$, when the data are perfect, that is, $E n_i = n_i$, $i = 1, 2, \ldots$. This is an ad hoc procedure and has not taken adequate account of sampling fluctuations.

There are two sources of errors in the quadrature methods. First, quadrature formulas of the Gauss type integrate polynomials exactly, but $e^{x} \log (N/x)$ is not a polynomial. Secondly, we are aggregating the "small $p_j$'s" and treating them as if they possessed relatively few values, whereas they are in general distributed over a region. This is a form of smoothing, whose properties are not completely understood at this time.
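The first source of error can be illustrated with the simplest textbook case. The sketch below uses the two-point Gauss-Legendre rule on $[-1, 1]$, which is an assumption made purely for illustration: the quadrature formulas of this paper are built from sample moments, not from the Legendre weight. The rule reproduces the integral of a cubic to rounding error but leaves a visible error for a non-polynomial integrand.

```python
import math

# Two-point Gauss-Legendre rule on [-1, 1]: nodes +-1/sqrt(3), both weights 1.
NODES = (-1 / math.sqrt(3), 1 / math.sqrt(3))
WEIGHTS = (1.0, 1.0)

def gauss2(f):
    """Approximate the integral of f over [-1, 1] with the two-point rule."""
    return sum(w * f(x) for w, x in zip(WEIGHTS, NODES))

# Polynomial of degree 3 = 2k - 1 with k = 2 nodes: integrated exactly.
cubic_exact = 2 / 3 - 4                      # integral of 5x^3 + x^2 - 2
cubic_error = abs(gauss2(lambda x: 5 * x**3 + x**2 - 2) - cubic_exact)

# Non-polynomial integrand: the same rule is only approximate.
exp_exact = math.e - 1 / math.e              # integral of exp(x)
exp_error = abs(gauss2(math.exp) - exp_exact)

if __name__ == "__main__":
    print(cubic_error, exp_error)
```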

Further  work  in  this  direction  is  being  continued  by  myself  and  my  students 
and  we  hope  to  be  able  to  report  further  results  in  this  direction  in  the  near 
future. 
Appendix 1

Some formulas and relationships for multinomial distributions

In the evaluation of (10) and (22), the central moments of the multinomial distribution have been used. As a convenience to the reader, they have been tabulated here, along with some identities and inequalities which have been used to obtain order estimates.

We denote $E\{(Z_1 - Np_1)^{j_1}(Z_2 - Np_2)^{j_2} \cdots (Z_s - Np_s)^{j_s}\}$ by $\mu_{j_1 j_2 \ldots j_s}$ for every $1 \le j_i < \infty$, $1 \le s < \infty$.


(A.1.1)   $\mu_1 = 0$

(A.1.2)   $\mu_2 = N(p_1 - p_1^2)$

(A.1.3)   $\mu_3 = N(p_1 - 3p_1^2 + 2p_1^3)$

(A.1.4)   $\mu_4 = 3N^2(p_1^2 - 2p_1^3 + p_1^4) + N(p_1 - 7p_1^2 + 12p_1^3 - 6p_1^4)$

(A.1.5)   $\mu_5 = 10N^2(p_1^2 - 4p_1^3 + 5p_1^4 - 2p_1^5) + N(p_1 - 15p_1^2 + 50p_1^3 - 60p_1^4 + 24p_1^5)$

(A.1.6)   $\mu_6 = N^3(15p_1^3 - 45p_1^4 + 45p_1^5 - 15p_1^6) + N^2(25p_1^2 - 180p_1^3 + 415p_1^4 - 390p_1^5 + 130p_1^6) + N(p_1 - 31p_1^2 + 180p_1^3 - 390p_1^4 + 360p_1^5 - 120p_1^6)$
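The single-subscript formulas are the central moments of the binomial count $Z_1$, so they can be checked by exact enumeration. The following sketch does this for (A.1.4) and (A.1.6) with rational arithmetic; the function names and the test values $N = 7$, $p_1 = 3/10$ are arbitrary illustrative choices.

```python
from fractions import Fraction
from math import comb

def central_moment(N, p, r):
    """Exact r-th central moment of a Binomial(N, p) count, by enumeration."""
    return sum(comb(N, z) * p**z * (1 - p)**(N - z) * (z - N * p)**r
               for z in range(N + 1))

def mu4(N, p):
    """Closed form (A.1.4)."""
    return 3*N**2*(p**2 - 2*p**3 + p**4) + N*(p - 7*p**2 + 12*p**3 - 6*p**4)

def mu6(N, p):
    """Closed form (A.1.6)."""
    return (N**3*(15*p**3 - 45*p**4 + 45*p**5 - 15*p**6)
            + N**2*(25*p**2 - 180*p**3 + 415*p**4 - 390*p**5 + 130*p**6)
            + N*(p - 31*p**2 + 180*p**3 - 390*p**4 + 360*p**5 - 120*p**6))

if __name__ == "__main__":
    N, p = 7, Fraction(3, 10)
    print(central_moment(N, p, 4) == mu4(N, p))
    print(central_moment(N, p, 6) == mu6(N, p))
```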
(A.1.7)   $\mu_{11} = -N p_1 p_2$

(A.1.8)   $\mu_{21} = N(2p_1^2 p_2 - p_1 p_2)$

(A.1.9)   $\mu_{111} = 2N p_1 p_2 p_3$

(A.1.10)   $\mu_{31} = 3N^2(p_1^3 p_2 - p_1^2 p_2) + N(-6p_1^3 p_2 - p_1 p_2 + 6p_1^2 p_2)$

(A.1.11)   $\mu_{22} = N^2(3p_1^2 p_2^2 - p_1^2 p_2 - p_2^2 p_1 + p_1 p_2) + N(-6p_1^2 p_2^2 + 2p_1^2 p_2 + 2p_1 p_2^2 - p_1 p_2)$

(A.1.12)   $\mu_{211} = N^2(3p_1^2 p_2 p_3 - p_1 p_2 p_3) + N(-6p_1^2 p_2 p_3 + 2p_1 p_2 p_3)$

(A.1.13)   $\mu_{1111} = 3N^2 p_1 p_2 p_3 p_4 - 6N p_1 p_2 p_3 p_4$ .
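The product moments can be checked in the same way by enumerating a small multinomial distribution. This is a sketch under illustrative assumptions: four classes with probabilities $(1/2, 1/4, 1/8, 1/8)$ and $N = 5$, and a helper `product_moment` that is not part of the paper.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def product_moment(N, p, powers):
    """E[ prod_i (Z_i - N p_i)^powers[i] ] for a multinomial(N, p), by enumeration.

    `powers` may be shorter than `p`; classes beyond len(powers) get power 0.
    """
    total = Fraction(0)
    for z in product(range(N + 1), repeat=len(p)):
        if sum(z) != N:
            continue
        coef = factorial(N)
        prob = Fraction(1)
        for zi, pi in zip(z, p):
            coef //= factorial(zi)          # stays an integer at every step
            prob *= pi ** zi
        term = coef * prob
        for zi, pi, j in zip(z, p, powers):
            term *= (zi - N * pi) ** j
        total += term
    return total

N = 5
p1, p2, p3, p4 = P = (Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8))

mu22 = (N**2*(3*p1**2*p2**2 - p1**2*p2 - p2**2*p1 + p1*p2)
        + N*(-6*p1**2*p2**2 + 2*p1**2*p2 + 2*p1*p2**2 - p1*p2))      # (A.1.11)
mu211 = N**2*(3*p1**2*p2*p3 - p1*p2*p3) + N*(-6*p1**2*p2*p3 + 2*p1*p2*p3)  # (A.1.12)
mu1111 = 3*N**2*p1*p2*p3*p4 - 6*N*p1*p2*p3*p4                         # (A.1.13)

if __name__ == "__main__":
    print(product_moment(N, P, (2, 2)) == mu22)
    print(product_moment(N, P, (2, 1, 1)) == mu211)
    print(product_moment(N, P, (1, 1, 1, 1)) == mu1111)
```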


These formulas may be obtained by completely elementary methods. Further, a number of these are given in G. P. Basarin [1], although one of them is incorrectly stated there. Similarly, all of the above with $\sum j_i \le 4$ may be found in F. N. David and D. E. Barton [3, p. 146], although one of them is incorrectly given there.

From (A.1.1)-(A.1.6), we note that

(A.1.14)   $\mu_{2r} = O(N^r)$, $\mu_{2r-1} = O(N^{r-1})$, $r = 1, 2, 3$.


From the well-known recursion, with $\mu_0 = 1$,

(A.1.15)   $\mu_{r+1} = (p_1 - p_1^2)\left(N r \mu_{r-1} + \dfrac{d\mu_r}{dp_1}\right)$, $r = 1, 2, \ldots,$

it follows that

(A.1.16)   $\mu_{2r} = O(N^r)$, $\mu_{2r-1} = O(N^{r-1})$, $r = 1, 2, \ldots$ .
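The recursion can be run mechanically on polynomials in $p_1$. The sketch below assumes the standard form of the binomial central-moment recursion, $\mu_{r+1} = (p - p^2)(N r \mu_{r-1} + d\mu_r/dp)$, represents each $\mu_r$ as a list of coefficients for a fixed $N$, and compares the result with direct enumeration. All helper names are illustrative.

```python
from fractions import Fraction
from math import comb

def poly_mul(a, b):
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def poly_add(a, b):
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
            for i in range(n)]

def poly_diff(a):
    """Derivative of a coefficient list (lowest power first)."""
    return [i * a[i] for i in range(1, len(a))] or [Fraction(0)]

def poly_eval(a, p):
    return sum(c * p**k for k, c in enumerate(a))

def central_moment_polys(N, rmax):
    """Coefficient lists (in powers of p) of mu_0 .. mu_rmax for fixed N."""
    pq = [Fraction(0), Fraction(1), Fraction(-1)]      # p - p^2
    mus = [[Fraction(1)], [Fraction(0)]]               # mu_0 = 1, mu_1 = 0
    for r in range(1, rmax):
        inner = poly_add([N * r * c for c in mus[r - 1]], poly_diff(mus[r]))
        mus.append(poly_mul(pq, inner))
    return mus

def central_moment(N, p, r):
    """Exact r-th central moment of Binomial(N, p), by enumeration."""
    return sum(comb(N, z) * p**z * (1 - p)**(N - z) * (z - N * p)**r
               for z in range(N + 1))

if __name__ == "__main__":
    N, p = 6, Fraction(2, 7)
    for r, poly in enumerate(central_moment_polys(N, 6)):
        print(r, poly_eval(poly, p) == central_moment(N, p, r))
```

For fixed $N$ the degree of the $r$-th polynomial in $p$ is at most $r$, which is consistent with the closed forms (A.1.1)-(A.1.6).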


We get order estimates for the product moments, that is, those indexed by more than one subscript, by use of repeated applications of the Cauchy-Schwarz inequality. Specifically, let $q \ge 1$ be an integer. Then for arbitrary random variables $W_1, W_2, \ldots, W_{2^q}$ such that the moments given below all exist, we obtain

(A.1.17)   $\left(E(W_1 W_2 \cdots W_{2^q})\right)^{2^q} \le \prod_{i=1}^{2^q} E W_i^{2^q}$ .

When $q = 1$, this is the customary form of the Cauchy-Schwarz inequality. To apply (A.1.17) to the situation at hand, we define $W_i = (Z_i - Np_i)^{j_i}$, $i = 1, 2, \ldots, s$, and if in $\mu_{j_1 j_2 \ldots j_s}$, $2^{q-1} < s \le 2^q$, $q \ge 1$, then we define $W_i = 1$, $i = s+1, s+2, \ldots, 2^q$. We write (A.1.17) in the form

(A.1.18)   $|E(W_1 W_2 \cdots W_{2^q})| \le \left\{\prod_{i=1}^{2^q} E W_i^{2^q}\right\}^{2^{-q}}$ .

Thus, for $2^{q-1} < s \le 2^q$, $q \ge 1$,

(A.1.19)   $\left|E\{(Z_1 - Np_1)^{j_1}(Z_2 - Np_2)^{j_2} \cdots (Z_s - Np_s)^{j_s}\}\right| \le \left\{\prod_{i=1}^{s} E(Z_i - Np_i)^{2^q j_i}\right\}^{2^{-q}}$ .
From (A.1.14),

$$E(Z_i - Np_i)^{2^q j_i} = O\left(N^{2^{q-1} j_i}\right), \quad i = 1, 2, \ldots, s,$$

and hence

$$\left\{\prod_{i=1}^{s} E(Z_i - Np_i)^{2^q j_i}\right\}^{2^{-q}} = O\left(N^{\frac{1}{2}\sum_{i=1}^{s} j_i}\right) .$$

Thus

(A.1.20)   $\mu_{j_1 j_2 \ldots j_s} = O\left(N^{\left[\frac{1}{2}\sum_{i=1}^{s} j_i\right]}\right)$ ;

the integer part is a consequence of the fact that $N$ can only appear in integer powers.
Appendix 2

The Behavior of the Asymptotic Variance and Mean Squared Error of $\hat H$. In this appendix we show that

(A.2.1)   $\max_{p_1, p_2, \ldots, p_s} G(p_1, p_2, \ldots, p_s) = \max_{p_1, p_2, \ldots, p_s} \left(\sum_{j=1}^{s} p_j \log^2 p_j - H^2\right) = \dfrac{4p^*(1-p^*)}{(1-2p^*)^2}$ ,

where $p^*$ is the largest solution, in $p$, of

$$\left[\frac{1-p}{(s-1)p}\right]^{1-2p} = e^2 .$$

This can be used to specify the sample size $N$ necessary to obtain estimates of $H$ of a given precision when using $\hat H$ and therefore is of importance when $s$ is bounded; or more precisely, when $s/N$ is sufficiently small. The minimum of $G(p_1, p_2, \ldots)$ is 0 and is trivially attained when

$$p_i = 0, \quad i \in I, \quad I \subset \{1, 2, \ldots, s\},$$
$$p_i = |I^c|^{-1}, \quad i \in I^c, \quad I^c \ne \emptyset .$$

This is easily verified as follows, since then

$$\sum_{j=1}^{s} p_j \log^2 p_j = \sum_{j \in I^c} p_j \log^2 p_j = |I^c|\,|I^c|^{-1} \log^2 |I^c| .$$

Further, in this case

$$H = -\sum_{j=1}^{s} p_j \log p_j = |I^c|\,|I^c|^{-1} \log |I^c| ,$$

thus verifying the assertion.
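Both extremes of $G$ can be explored numerically. The sketch below assumes the defining equation for $p^*$ reads $[(1-p)/((s-1)p)]^{1-2p} = e^2$, solves it by bisection on $(1/2, 1)$, and compares the closed form $4p^*(1-p^*)/(1-2p^*)^2$ with a direct evaluation of $G$ at the configuration with one class of probability $p^*$ and $s-1$ classes sharing the rest. The function names are illustrative.

```python
import math
import random

def G(p):
    """G(p_1..p_s) = sum p log^2 p - H^2; zero-probability classes contribute nothing."""
    pos = [x for x in p if x > 0]
    H = -sum(x * math.log(x) for x in pos)
    return sum(x * math.log(x) ** 2 for x in pos) - H * H

def p_star(s, tol=1e-14):
    """Root in (1/2, 1) of (1 - 2p) * log((1-p) / ((s-1)p)) = 2, by bisection."""
    f = lambda p: (1 - 2 * p) * math.log((1 - p) / ((s - 1) * p)) - 2
    lo, hi = 0.5 + 1e-9, 1 - 1e-12       # f(lo) < 0 < f(hi), f increasing in between
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def G_max(s):
    p = p_star(s)
    return 4 * p * (1 - p) / (1 - 2 * p) ** 2

if __name__ == "__main__":
    for s in (2, 3, 10):
        p = p_star(s)
        direct = G([p] + [(1 - p) / (s - 1)] * (s - 1))
        print(s, round(p, 5), round(G_max(s), 5), round(direct, 5))
    print(G([0.5, 0.5, 0.0, 0.0]))       # uniform on a subset: the minimum, 0
```

A random search over the simplex (as in the accompanying check for $s = 3$) is only a sanity test of the maximum, not a proof.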

To determine the maximum, we first note that

(A.2.2)   $G^*(s) = \max_{p_1, \ldots, p_s} G(p_1, \ldots, p_s) \ge \max_{p_1, \ldots, p_{s-1}} G(p_1, \ldots, p_{s-1}, 0) = \max_{p_1, \ldots, p_{s-1}} G(p_1, \ldots, p_{s-1}) = G^*(s-1)$ .

Now let $p_s = 1 - \sum_{i=1}^{s-1} p_i$ and note that for $j = 1, 2, \ldots, s-1$

(A.2.3)   $\dfrac{\partial G(p_1, \ldots, p_s)}{\partial p_j} = (\log p_j - \log p_s)(\log p_j + \log p_s + 2 + 2H)$ .
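The derivative in (A.2.3) can be sanity-checked with a finite difference in which $p_s$ absorbs the change in $p_j$. This is a minimal sketch; the test point and step size are arbitrary choices.

```python
import math

def G(p):
    """G = sum p log^2 p - H^2 for strictly positive probabilities."""
    H = -sum(x * math.log(x) for x in p)
    return sum(x * math.log(x) ** 2 for x in p) - H * H

def grad_formula(p, j):
    """Right-hand side of (A.2.3) for free coordinate j (0-based); p[-1] is p_s."""
    H = -sum(x * math.log(x) for x in p)
    lj, ls = math.log(p[j]), math.log(p[-1])
    return (lj - ls) * (lj + ls + 2 + 2 * H)

def grad_numeric(p, j, h=1e-6):
    """Central difference in p_j, with p_s adjusted so the total stays 1."""
    up, dn = list(p), list(p)
    up[j] += h
    up[-1] -= h
    dn[j] -= h
    dn[-1] += h
    return (G(up) - G(dn)) / (2 * h)

if __name__ == "__main__":
    p = [0.5, 0.2, 0.2, 0.1]
    for j in range(len(p) - 1):
        print(j, grad_formula(p, j), grad_numeric(p, j))
```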

Setting

(A.2.4)   $\dfrac{\partial G(p_1, \ldots, p_s)}{\partial p_j} = 0$, $j = 1, 2, \ldots, s-1$ ,

we note that in each equation we must have either $\log p_j - \log p_s = 0$ or $\log p_j + \log p_s + 2 + 2H = 0$. Clearly if $\log p_j = \log p_s$ for $j = 1, 2, \ldots, s-1$, we have $p_j = 1/s$, $j = 1, 2, \ldots, s$ and $G(p_1, p_2, \ldots, p_s) = 0$, a minimum. Hence there must be at least one $j$ with $\log p_j + \log p_s + 2 + 2H = 0$. Since for any solution of (A.2.4), any permutation of the indices $1, 2, \ldots, s$ is also a solution, with no loss of generality, we can set

$$\log p_j + \log p_s + 2 + 2H = 0, \quad j = 1, 2, \ldots, t; \quad 1 \le t \le s-1 ;$$

then we have

(A.2.5)   $p_j p_s = e^{-2-2H}$, $j = 1, 2, \ldots, t$ ; $\quad p_j = p_s$, $j = t+1, t+2, \ldots, s-1$ ,

the set of indices $j$ for which $p_j = p_s$ possibly being empty. From (A.2.5), we have

(A.2.6)   $\sum_{j=1}^{t} p_j p_s = p_s(1 - (s-t)p_s) = t e^{-2-2H}$ .


For fixed $t$, let $H^*$ be any $H$ in the solution set of (A.2.6). Then, $p_s = p_s(H^*)$ has at most two solutions, say $p_{s1}(H^*)$ and $p_{s2}(H^*)$. Thus, from (A.2.5), we have, for every $H^*$ and every $p_{si}(H^*)$, $i = 1, 2$,

(A.2.7)   $p_k(H^*) = p_j(H^*)$, $1 \le j, k \le t$ ; $\quad p_k(H^*) \ne p_{si}(H^*)$, $1 \le k \le t$ .

Thus every solution to (A.2.5) has the form

(A.2.8)   $p_j = \dfrac{1 - (s-t)p_s}{t}$, $j = 1, 2, \ldots, t$ ; $\quad p_j = p_s$, $j = t+1, \ldots, s-1$ .
This yields

$$H = -(s-t)\,p_s \log p_s - (1 - (s-t)p_s) \log\left[\frac{1 - (s-t)p_s}{t}\right] .$$

Substituting this into (A.2.6), we obtain

(A.2.9)   $e^2 = \left[\dfrac{1 - (s-t)p_s}{t\,p_s}\right]^{1 - 2(s-t)p_s}$, $0 < p_s < (s-t)^{-1}$ .

Now the logarithm of the right hand side of (A.2.9) is a convex function of $p_s$ which assumes the value $+\infty$ at $p_s = 0$ and $p_s = (s-t)^{-1}$ and the value 0 at $p_s = 1/s$ and $p_s = 1/2(s-t)$. Thus there are exactly two solutions of (A.2.9), $p_{s1}$ and $p_{s2}$, with $0 < p_{s1} < 1/s < 1/2(s-t) < p_{s2} < 1/(s-t)$. As a consequence of the preceding discussion, we have that

(A.2.10)   $G^*(s) = \max_{1 \le t \le s-1} \max_{i=1,2} G\left(\dfrac{1 - (s-t)p_{si}}{t}, \ldots, \dfrac{1 - (s-t)p_{si}}{t}, p_{si}, \ldots, p_{si}\right) = \max_{1 \le t \le s-1} \max_{i=1,2} G_1(t, p_{si})$ .

Further, note that if $(t, p_{si})$ is a solution of (A.2.5), then $\left(s-t, \dfrac{1 - (s-t)p_{si}}{t}\right)$ is also a solution and

$$G_1(t, p_{si}) = G_1\left(s-t, \frac{1 - (s-t)p_{si}}{t}\right) .$$

Thus, we can reduce (A.2.10) to

$$G^*(s) = \max_{s/2 \le t \le s-1} \max_{i=1,2} G_1(t, p_{si}) .$$

Hence one can determine the maximum by evaluating $G_1(t, p_{si})$ for $s-1$ choices of $(t, p_{si})$. However, an exact computation is possible. Hence we proceed further.

Note that by using (A.2.1), we have, for $i = 1, 2$;

$$G_1(t, p_{si}) = (1 - (s-t)p_{si}) \log^2\left(\frac{1 - (s-t)p_{si}}{t}\right) + (s-t)p_{si} \log^2 p_{si} - \left\{(1 - (s-t)p_{si}) \log\left(\frac{1 - (s-t)p_{si}}{t}\right) + (s-t)p_{si} \log p_{si}\right\}^2$$

$$= (1 - (s-t)p_{si})(s-t)\,p_{si}\left(\log(1 - (s-t)p_{si}) - \log(t p_{si})\right)^2 .$$

Then, since $p_{si}$ is a solution of equation (A.2.9), we have

$$\log(1 - (s-t)p_{si}) - \log(t p_{si}) = 2/(1 - 2(s-t)p_{si}) ,$$

hence

(A.2.11)   $G_1(t, p_{si}) = \dfrac{4(s-t)\,p_{si}(1 - (s-t)p_{si})}{(1 - 2(s-t)p_{si})^2}$, $i = 1, 2$ .

Consequently, we define

(A.2.12)   $G_2(p) = \dfrac{4p(1-p)}{(1-2p)^2}$, $0 < p < 1$ ,

where $p_i = (s-t)p_{si}$. Thus $G_2(p_i) = G_1(t, p_{si})$, $i = 1, 2$. Clearly $G_2(p)$ is symmetric about $p = 1/2$. Further, $G_2(p)$ is increasing for $0 < p < 1/2$ and decreasing for $1/2 < p < 1$. Consequently

$$\max_{i=1,2} G_1(t, p_{si}) = G_1(t, p_s^*) = G_2(p^*) ,$$

where $p^* = p_2$ if $p_2 - 1/2 < 1/2 - p_1$ and $p_1$ otherwise, and $p_s^* = p^*/(s-t)$.

We now show that $p^* = p_2$.

We transform (A.2.9) similarly, obtaining

(A.2.13)   $\left[\dfrac{(s-t)(1-p)}{tp}\right]^{1-2p} = e^2$, $0 < p < 1$ .

If $t = s/2$, then the left hand side of (A.2.13) is symmetric about $p = .5$ and consequently $p_2 - 1/2 = 1/2 - p_1$ in that case. Further, since $t \ge s/2$, for $p < 1/2$,

(A.2.14)   $\left(\dfrac{1-p}{p}\right)^{1-2p} \ge \left(\dfrac{(s-t)(1-p)}{tp}\right)^{1-2p}$ ,

and for $p > 1/2$

(A.2.15)   $\left(\dfrac{1-p}{p}\right)^{1-2p} \le \left(\dfrac{(s-t)(1-p)}{tp}\right)^{1-2p}$ .

Thus in general, $p_2 - 1/2 \le 1/2 - p_1$ and $p^* = p_2$. Hence $p_s^* = p_{s2}$.

Consequently, in (A.2.12) and (A.2.13), we can restrict attention to the region $p > 1/2$. Thus, we have shown that

$$G^*(s) = \max_{s/2 \le t \le s-1} G_1(t, p_{s2}) .$$

Further, note that (A.2.13) depends on $t$ and $s$ only through $(s-t)/t$.

Now let $s$ and $p > 1/2$ be fixed and consider $t_1$, $t_2$ with $s/2 \le t_1 < t_2 \le s-1$. Then

$$\frac{s - t_1}{t_1} > \frac{s - t_2}{t_2}$$

and hence


Appendix  3 


In this appendix we justify some of the approximations used in the quadrature method. In the subsequent discussion $\tau$ and $\lambda$ will be given real numbers with $0 < \tau < 1/2 < \lambda < 1$ and $\tau + \lambda < 1$. We now establish the following computational lemmas.

Lemma A.3.1. Let $N \to \infty$. Then for any integer $r \ge 0$, and any $c > 0$, and $r < cN^{\tau}$,

(A.3.1)   $\dbinom{N}{r} = \dfrac{N^r}{r!}\left(1 + O(N^{2\tau - 1})\right)$ .

The proof of this lemma is trivial and therefore omitted.

Lemma A.3.2. For $r < cN^{\tau}$ and $p < cN^{-\lambda}$, as $N \to \infty$,

(A.3.2)   $(1-p)^{N-r} = e^{-Np}\left(1 + O(N^{1-2\lambda})\right)$ .

The proof of this lemma is trivial and therefore omitted.

Lemma A.3.3. For $r < cN^{\tau}$ and $p > cN^{-\lambda}$, for every $\epsilon > 0$ there is an $N_\epsilon$ sufficiently large so that for $N \ge N_\epsilon$

(A.3.3)   $\dbinom{N}{r} p^r (1-p)^{N-r} < e^{-N^{1-\lambda-\epsilon}}$

and

(A.3.4)   $\dfrac{(Np)^r}{r!} e^{-Np} < e^{-N^{1-\lambda-\epsilon}}$ .

The proof of this lemma is trivial and therefore omitted.
Lemma A.3.4. If $n_r$ is the number of classes occurring $r$ times then

$$E(n_r) = \sum_{j=1}^{\infty} \binom{N}{r} p_j^r (1-p_j)^{N-r} .$$

Proof. Let $Z_j = 1$ if the $j$th class occurs $r$ times in the $N$ observations and $Z_j = 0$ otherwise. Then $n_r = \sum_{j=1}^{\infty} Z_j$ and

$$E n_r = E \sum_{j=1}^{\infty} Z_j = \sum_{j=1}^{\infty} E Z_j = \sum_{j=1}^{\infty} P\{Z_j = 1\} = \sum_{j=1}^{\infty} \binom{N}{r} p_j^r (1-p_j)^{N-r} .$$
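Lemma A.3.4 is easy to confirm on a small finite population by enumerating every outcome. The sketch below uses three classes with probabilities $(1/2, 1/3, 1/6)$ and $N = 4$; these values and the function names are illustrative assumptions, and the infinite sum of the lemma reduces to a finite one here.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial

def expected_n_r_formula(N, p, r):
    """Sum over classes of C(N, r) p_j^r (1 - p_j)^(N - r)."""
    return sum(comb(N, r) * pj**r * (1 - pj)**(N - r) for pj in p)

def expected_n_r_enumerated(N, p, r):
    """E(n_r) computed directly from the multinomial distribution of the counts."""
    total = Fraction(0)
    for z in product(range(N + 1), repeat=len(p)):
        if sum(z) != N:
            continue
        coef = factorial(N)
        prob = Fraction(1)
        for zi, pj in zip(z, p):
            coef //= factorial(zi)
            prob *= pj ** zi
        n_r = sum(1 for zi in z if zi == r)   # classes occurring exactly r times
        total += coef * prob * n_r
    return total

if __name__ == "__main__":
    N = 4
    p = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))
    for r in range(N + 1):
        print(r, expected_n_r_formula(N, p, r), expected_n_r_enumerated(N, p, r))
```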

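Combining all of the above, we have the following theorem.

Theorem A.3.1. Let $\tau$ and $\lambda$ be given real numbers with $0 < \tau < 1/2 < \lambda < 1$ and $\tau + \lambda < 1$. Then given a random sample $(X_1, X_2, \ldots, X_N)$ of $N$ observations from the population with $P(X_i \in M_j) = p_j$, $j = 1, 2, \ldots, \infty$, and if $n_r$ is the number of cells $M_j$ such that exactly $r$ $X_i$'s $\in M_j$, then

$$E(n_r) = \sum_{j=1}^{\infty} \binom{N}{r} p_j^r (1-p_j)^{N-r}$$

and for $r < N^{\tau}$ and for every $\epsilon > 0$ there is an $N_\epsilon$ such that for $N \ge N_\epsilon$

$$E(n_r) = \sum_{p_j < N^{-\lambda}} \frac{(Np_j)^r}{r!} e^{-Np_j}\left(1 + O(N^{2\tau-1}) + O(N^{1-2\lambda})\right) + O\left(e^{-N^{1-\lambda-\epsilon}}\right) .$$

Proof. There are at most $N^{\lambda}$ cells with $p_j \ge N^{-\lambda}$; hence from Lemma A.3.3 and A.3.4,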


_X  -d-X-t 


ZN  r N-r  X -N*  " ' -N*"  * 

(yJPjd-Pj)  < N e < e N 


„-X 

JJPj>^ 
The first term is direct from Lemmas A.3.1 and A.3.2.

We now obtain:

Theorem A.3.2. For $\tau$ and $\lambda$ such that $0 < \tau < 1/2 < \lambda < 1$ and $\tau + \lambda < 1$, we have

$$E(n_r) = \left(1 + O(N^{\tau - \lambda})\right) \sum_{p_j < N^{-\lambda}} \frac{(Np_j)^r}{r!} e^{-Np_j} + O\left(e^{-N^{1-\lambda-\epsilon}}\right) .$$

Proof. We utilize the easily established fact that if $a_i$, $b_i$ are positive numbers, $i = 1, 2, \ldots$, and $\dfrac{a_i}{b_i} \le \dfrac{a_0}{b_0}$, $i = 1, 2, \ldots$, then

$$\frac{a_0}{b_0} \ge \frac{\sum a_i}{\sum b_i} .$$

Then for $p_j < N^{-\lambda}$, $r \le N^{\tau}$, we have

$$\frac{\binom{N}{r} p_j^r (1-p_j)^{N-r}}{\dfrac{(Np_j)^r}{r!} e^{-Np_j}} = \frac{N(N-1) \cdots (N-r+1)}{N^r}\,(1-p_j)^{N-r} e^{Np_j} < e^{N^{\tau-\lambda}} = 1 + O(N^{\tau - \lambda}) .$$

Thus

$$\frac{\sum_{p_j < N^{-\lambda}} \binom{N}{r} p_j^r (1-p_j)^{N-r}}{\sum_{p_j < N^{-\lambda}} \dfrac{(Np_j)^r}{r!} e^{-Np_j}} = 1 + O(N^{\tau - \lambda}) .$$

The conclusion follows from Theorem A.3.1.
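The quality of the Poisson approximation for small cell probabilities can be seen on a simple case. The sketch below assumes a hypothetical population of $K = 10{,}000$ equally likely classes and $N = 1000$ observations, so that every $p_j = 10^{-4}$ is small relative to the thresholds considered here; the function names are illustrative.

```python
import math

def exact_term(N, r, p):
    """Binomial probability that a class of probability p occurs exactly r times."""
    return math.comb(N, r) * p**r * (1 - p)**(N - r)

def poisson_term(N, r, p):
    """Poisson approximation (Np)^r e^{-Np} / r! to the same probability."""
    return (N * p)**r * math.exp(-N * p) / math.factorial(r)

def expected_counts(N, K, r):
    """Exact and approximate E(n_r) for K equally likely classes."""
    p = 1 / K
    return K * exact_term(N, r, p), K * poisson_term(N, r, p)

if __name__ == "__main__":
    for r in range(4):
        exact, approx = expected_counts(1000, 10_000, r)
        print(r, exact, approx, exact / approx)
```

With all classes equally likely the ratio of the two sums equals the ratio of a single pair of terms, which makes the relative error easy to read off.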
Appendix 4

In this appendix we provide two examples of populations for which $H = \infty$.

Example 1. Since $\sum_{j=1}^{\infty} \dfrac{1}{(j+1)\log^2 (j+1)} = c < \infty$, let $p_j = 1/c(j+1)\log^2 (j+1)$, $j = 1, 2, \ldots$ . Then $\log p_j = -\log c - \log(j+1) - 2 \log \log (j+1)$. Then

$$-\sum_{j=1}^{\infty} p_j \log p_j = \sum_{j=1}^{\infty} \frac{\log c + \log(j+1) + 2 \log\log(j+1)}{c(j+1)\log^2(j+1)} \ge \sum_{j=1}^{\infty} \frac{1}{c(j+1)\log(j+1)} = \infty .$$

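Example 2. Let $m_k$ be the smallest non-negative integer such that

$$m_k \ge \frac{2^k}{k} - k, \quad k \ge 1 .$$

Let $M_0 = 0$, $M_k = \sum_{i=1}^{k} 2^{m_i}$. Define $p_i = 2^{-k-m_k}$ for $M_{k-1} < i \le M_k$, $i$ an integer. Thus $0 < p_i < 1$, $i \ge 1$,

$$\sum_{i=1}^{\infty} p_i = \sum_{k=1}^{\infty} \sum_{i=M_{k-1}+1}^{M_k} 2^{-k-m_k} = \sum_{k=1}^{\infty} \frac{2^{m_k}}{2^{k+m_k}} = \sum_{k=1}^{\infty} \frac{1}{2^k} = 1 .$$

Using logarithms base 2, we have

$$-\sum_{i=1}^{\infty} p_i \log p_i = \sum_{k=1}^{\infty} \sum_{i=M_{k-1}+1}^{M_k} -p_i \log p_i = \sum_{k=1}^{\infty} \frac{2^{m_k}(k+m_k)}{2^{k+m_k}} = \sum_{k=1}^{\infty} \frac{k+m_k}{2^k} \ge \sum_{k=1}^{\infty} \frac{k + \left(\frac{2^k}{k} - k\right)}{2^k} = \sum_{k=1}^{\infty} \frac{1}{k} = \infty .$$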
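The construction in Example 2 can be followed block by block. The sketch below assumes that $m_k$ is the smallest non-negative integer with $m_k \ge 2^k/k - k$, as the final chain of inequalities requires, and works with exact rationals; the function names are illustrative. Block $k$ carries total probability $2^{-k}$ and contributes $(k + m_k)/2^k \ge 1/k$ bits, so the partial entropy sums dominate the harmonic series.

```python
from fractions import Fraction

def m(k):
    """Smallest non-negative integer with m >= 2**k / k - k."""
    return max(0, -(-2**k // k) - k)      # ceil(2**k / k) - k, clipped at 0

def block_mass(k):
    """Total probability of block k: 2**m_k classes of probability 2**-(k + m_k)."""
    return Fraction(2 ** m(k), 2 ** (k + m(k)))

def block_entropy_bits(k):
    """Contribution of block k to the entropy in bits: (k + m_k) / 2**k."""
    return Fraction(k + m(k), 2 ** k)

def partial_entropy(K):
    return sum(block_entropy_bits(k) for k in range(1, K + 1))

def harmonic(K):
    return sum(Fraction(1, k) for k in range(1, K + 1))

if __name__ == "__main__":
    for K in (5, 10, 20, 40):
        print(K, float(partial_entropy(K)), float(harmonic(K)))
```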